North Atlantic Ocean
Optimal Lower Bounds for Online Multicalibration
Collina, Natalie, Lu, Jiuyao, Noarov, Georgy, Roth, Aaron
We prove tight lower bounds for online multicalibration, establishing an information-theoretic separation from marginal calibration. In the general setting where group functions can depend on both context and the learner's predictions, we prove an $Ω(T^{2/3})$ lower bound on expected multicalibration error using just three disjoint binary groups. This matches the upper bounds of Noarov et al. (2025) up to logarithmic factors and exceeds the $O(T^{2/3-\varepsilon})$ upper bound for marginal calibration (Dagan et al., 2025), thereby separating the two problems. We then turn to lower bounds for the more difficult case of group functions that may depend on context but not on the learner's predictions. In this case, we establish an $\widetildeΩ(T^{2/3})$ lower bound for online multicalibration via a $Θ(T)$-sized group family constructed using orthogonal function systems, again matching upper bounds up to logarithmic factors.
- South America > Suriname > North Atlantic Ocean (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > New York (0.04)
- Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)
Data-Driven Learnability Transition of Measurement-Induced Entanglement
Measurement-induced entanglement (MIE) captures how local measurements generate long-range quantum correlations and drive dynamical phase transitions in many-body systems. Yet estimating MIE experimentally remains challenging: direct evaluation requires extensive post-selection over measurement outcomes, raising the question of whether MIE is accessible with only polynomial resources. We address this challenge by reframing MIE detection as a data-driven learning problem that assumes no prior knowledge of state preparation. Using measurement records alone, we train a neural network in a self-supervised manner to predict the uncertainty metric for MIE--the gap between upper and lower bounds of the average post-measurement bipartite entanglement. Applied to random circuits with one-dimensional all-to-all connectivity and two-dimensional nearest-neighbor coupling, our method reveals a learnability transition with increasing circuit depth: below a threshold, the uncertainty is small and decreases with polynomial measurement data and model parameters, while above it the uncertainty remains large despite increasing resources. We further verify this transition experimentally on current noisy quantum devices, demonstrating its robustness to realistic noise. These results highlight the power of data-driven approaches for learning MIE and delineate the practical limits of its classical learnability.
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States (0.05)
- Asia > China > Anhui Province > Hefei (0.04)
- (3 more...)
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
Lin, Jingyang, Wu, Jialian, Sun, Ximeng, Wang, Ze, Liu, Jiang, Su, Yusheng, Yu, Xiaodong, Chen, Hao, Luo, Jiebo, Liu, Zicheng, Barsoum, Emad
Recent long-form video-language understanding benchmarks have driven progress in video large multimodal models (Video-LMMs). However, the scarcity of well-annotated long videos has left the training of hour-long Video-LMMs underexplored. To close this gap, we present VideoMarathon, a large-scale hour-long video instruction-following dataset. This dataset includes around 9,700 hours of long videos sourced from diverse domains, ranging from 3 to 60 minutes per video. Specifically, it contains 3.3M high-quality QA pairs, spanning six fundamental topics: temporality, spatiality, object, action, scene, and event. Compared to existing video instruction datasets, VideoMarathon significantly extends training video durations up to 1 hour, and supports 22 diverse tasks requiring both short- and long-term video comprehension. Building on VideoMarathon, we propose Hour-LLaVA, a powerful and efficient Video-LMM for hour-scale video-language modeling. It enables hour-long video training and inference at 1-FPS sampling by leveraging a memory augmentation module, which adaptively integrates question-relevant and spatiotemporally informative semantics from the cached full video context. In our experiments, Hour-LLaVA achieves the best performance on multiple representative long video-language benchmarks, demonstrating the high quality of the VideoMarathon dataset and the superiority of the Hour-LLaVA model.
NeuroVascU-Net: A Unified Multi-Scale and Cross-Domain Adaptive Feature Fusion U-Net for Precise 3D Segmentation of Brain Vessels in Contrast-Enhanced T1 MRI
Vayeghan, Mohammad Jafari, Delfan, Niloufar, Masouleh, Mehdi Tale, Rizi, Mansour Parvaresh, Moshiri, Behzad
Precise 3D segmentation of cerebral vasculature from T1-weighted contrast-enhanced (T1CE) MRI is crucial for safe neurosurgical planning. Manual delineation is time-consuming and prone to inter-observer variability, while current automated methods often trade accuracy for computational cost, limiting clinical use. We present NeuroVascU-Net, the first deep learning architecture specifically designed to segment cerebrovascular structures directly from clinically standard T1CE MRI in neuro-oncology patients, addressing a gap in prior work dominated by TOF-MRA-based approaches. NeuroVascU-Net builds on a dilated U-Net and integrates two specialized modules: a Multi-Scale Contextual Feature Fusion ($MSC^2F$) module at the bottleneck and a Cross-Domain Adaptive Feature Fusion ($CDA^2F$) module at deeper hierarchical layers. $MSC^2F$ captures both local and global information via multi-scale dilated convolutions, while $CDA^2F$ dynamically integrates domain-specific features, enhancing representation while keeping computation low. The model was trained and validated on a curated dataset of T1CE scans from 137 brain tumor biopsy patients, annotated by a board-certified functional neurosurgeon. NeuroVascU-Net achieved a Dice score of 0.8609 and precision of 0.8841, accurately segmenting both major and fine vascular structures. Notably, it requires only 12.4M parameters, significantly fewer than transformer-based models such as Swin U-NetR. This balance of accuracy and efficiency positions NeuroVascU-Net as a practical solution for computer-assisted neurosurgical planning.
- South America > Suriname > North Atlantic Ocean (0.14)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
- (6 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
WorldGen: From Text to Traversable and Interactive 3D Worlds
Wang, Dilin, Jung, Hyunyoung, Monnier, Tom, Sohn, Kihyuk, Zou, Chuhang, Xiang, Xiaoyu, Yeh, Yu-Ying, Liu, Di, Huang, Zixuan, Nguyen-Phuoc, Thu, Fan, Yuchen, Oprea, Sergiu, Wang, Ziyan, Shapovalov, Roman, Sarafianos, Nikolaos, Groueix, Thibault, Toisoul, Antoine, Dhar, Prithviraj, Chu, Xiao, Chen, Minghao, Park, Geon Yeong, Gupta, Mahima, Azziz, Yassir, Ranjan, Rakesh, Vedaldi, Andrea
We introduce WorldGen, a system that enables the automatic creation of large-scale, interactive 3D worlds directly from text prompts. Our approach transforms natural language descriptions into traversable, fully textured environments that can be immediately explored or edited within standard game engines. By combining LLM-driven scene layout reasoning, procedural generation, diffusion-based 3D generation, and object-aware scene decomposition, WorldGen bridges the gap between creative intent and functional virtual spaces, allowing creators to design coherent, navigable worlds without manual modeling or specialized 3D expertise. The system is fully modular and supports fine-grained control over layout, scale, and style, producing worlds that are geometrically consistent, visually rich, and efficient to render in real time. This work represents a step towards accessible, generative world-building at scale, advancing the frontier of 3D generative AI for applications in gaming, simulation, and immersive social environments.
- South America > Suriname > North Atlantic Ocean (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches
Moglia, Andrea, Leccardi, Matteo, Cavicchioli, Matteo, Maccarini, Alice, Marcon, Marco, Mainardi, Luca, Cerveri, Pietro
Following the successful paradigm shift of large language models, leveraging pre-training on a massive corpus of data and fine-tuning on different downstream tasks, generalist models have made their foray into computer vision. The introduction of Segment Anything Model (SAM) set a milestone on segmentation of natural images, inspiring the design of a multitude of architectures for medical image segmentation. In this survey we offer a comprehensive and in-depth investigation on generalist models for medical image segmentation. We start with an introduction on the fundamentals concepts underpinning their development. Then, we provide a taxonomy on the different declinations of SAM in terms of zero-shot, few-shot, fine-tuning, adapters, on the recent SAM 2, on other innovative models trained on images alone, and others trained on both text and images. We thoroughly analyze their performances at the level of both primary research and best-in-literature, followed by a rigorous comparison with the state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI and physical AI, and clinical translation.
- Europe > Italy (0.04)
- North America > Canada (0.04)
- Oceania > Australia (0.04)
- (14 more...)
- Overview (1.00)
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.45)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- (2 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Bi-AQUA: Bilateral Control-Based Imitation Learning for Underwater Robot Arms via Lighting-Aware Action Chunking with Transformers
Tsunoori, Takeru, Kobayashi, Masato, Uranishi, Yuki
Abstract--Underwater robotic manipulation is fundamentally challenged by extreme lighting variations, color distortion, and reduced visibility. We introduce Bi-AQUA, the first underwater bilateral control-based imitation learning framework that integrates lighting-aware visual processing for underwater robot arms. Bi-AQUA employs a hierarchical three-level lighting adaptation mechanism: a Lighting Encoder that extracts lighting representations from RGB images without manual annotation and is implicitly supervised by the imitation objective, FiLM modulation of visual backbone features for adaptive, lighting-aware feature extraction, and an explicit lighting token added to the transformer encoder input for task-aware conditioning. Experiments on a real-world underwater pick-and-place task under diverse static and dynamic lighting conditions show that Bi-AQUA achieves robust performance and substantially outperforms a bilateral baseline without lighting modeling. Ablation studies further confirm that all three lighting-aware components are critical. Underwater robotic manipulation is uniquely challenging because the visual appearance of the same scene can change drastically within seconds under shifts in the spectrum, intensity, and direction of underwater lighting [1].
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
- South America > Suriname > North Atlantic Ocean (0.04)
- Asia > India > Tamil Nadu > Chennai (0.04)
- South America > Suriname > North Atlantic Ocean (0.14)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (2 more...)
R3 and R4 note that assumption 2--the embedding method extracts the salient predictive information
We thank the reviewers for reading the paper and for their detailed comments. However, establishing such results is outside the scope of the paper. R3 and R4 both note that partially correcting for confounding may hurt. We have corrected the typos you point out (1. is anonymized ref, 2. is estimate of all nuisance params, 3. is sigmoid) We agree; this was a minor error on our part. Beyond the fact that it fixes the bug, we think this condition is more interpretable--thank you!